352 research outputs found

    Ab Initio Exon Definition Using an Information Theory-based Approach

    Transcribed exons in genes are joined together at donor and acceptor splice sites precisely and efficiently to generate mRNAs capable of being translated into proteins. The sequence variability in individual splice sites can be modeled using Shannon information theory. In the laboratory, the degree of individual splice site use is inferred from the structures of mRNAs and their relative abundance. These structures can be predicted using a bipartite information theory framework that is guided by current knowledge of biological mechanisms for exon recognition. We present the results of this analysis for the complete dataset of all expressed human exons.
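
As a rough illustration of how Shannon information can describe sequence variability at aligned sites, the sketch below computes the information content of each column as 2 bits minus the column's entropy. It is a simplified sketch, not the authors' method: it omits any small-sample correction, the function name `position_information` is hypothetical, and `toy_sites` is made-up data rather than real splice sites.

```python
import math
from collections import Counter


def position_information(sites):
    """Return the information content (bits) of each column of aligned DNA sites.

    Uses 2 - H(column), where H is the Shannon entropy of the observed base
    frequencies. No small-sample correction is applied (simplifying assumption).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Made-up, donor-like toy alignment used only for illustration.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGG"]
per_position = position_information(toy_sites)
total_information = sum(per_position)
```

Fully conserved columns score the maximum of 2 bits, while variable columns score less; summing the columns gives a single value for the whole alignment.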

    Multigene signatures of responses to chemotherapy derived by biochemically-inspired machine learning.

    Pharmacogenomic responses to chemotherapy drugs can be modeled by supervised machine learning of expression and copy number of relevant gene combinations. Such biochemical evidence can form the basis of derived gene signatures using cell line data, which can subsequently be examined in patients that have been treated with the same drugs. These gene signatures typically contain elements of multiple biochemical pathways which together comprise multiple origins of drug resistance or sensitivity. The signatures can capture variation in these responses to the same drug among different patients

    BIPAD: A web server for modeling bipartite sequence elements

    BACKGROUND: Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites. RESULTS: We introduce the Bipad Server [1], a web interface to predict sequence elements embedded within unaligned sequences. Either a bipartite model, consisting of a pair of one-block position weight matrices (PWMs) with a gap distribution, or a single PWM for contiguous single-block motifs may be produced. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information contents among a set of potential models with varying half-site and gap lengths. CONCLUSION: The web service generates information positional weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA binding proteins.

    Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes

    Splicing mutation analysis reveals previously unrecognized pathways in lymph node-invasive breast cancer.

    Somatic mutations reported in large-scale breast cancer (BC) sequencing studies primarily consist of protein coding mutations. mRNA splicing mutation analyses have been limited in scope, despite their prevalence in Mendelian genetic disorders. We predicted splicing mutations in 442 BC tumour and matched normal exomes from The Cancer Genome Atlas Consortium (TCGA). These splicing defects were validated by abnormal expression changes in these tumours. Of the 5,206 putative mutations identified, exon skipping, leaky or cryptic splicing was confirmed for 988 variants. Pathway enrichment analysis of the mutated genes revealed mutations in 9 NCAM1-related pathways, which were significantly increased in samples with evidence of lymph node metastasis, but not in lymph node-negative tumours. We suggest that comprehensive reporting of DNA sequencing data should include non-trivial splicing analyses to avoid missing clinically-significant deleterious splicing mutations, which may reveal novel mutated pathways present in genetic disorders

    Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis.

    The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations
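
The change in information content caused by a single substitution can be sketched as follows. This is an illustrative approximation only: the weights use 2 + log2(frequency) with a pseudocount, which is an assumption made here to avoid log(0) and is not the small-sample correction used in published information-theory software. The function names and the toy alignment are hypothetical, and this is not the Splicing Mutation Calculator's implementation.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5):
    """Build per-position weights of 2 + log2(f(base)) from aligned sites.

    The pseudocount is an illustrative assumption, not a published correction.
    """
    length = len(sites[0])
    matrix = []
    for col in range(length):
        column = [site[col] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        matrix.append(
            {b: 2.0 + math.log2((column.count(b) + pseudocount) / total) for b in BASES}
        )
    return matrix


def site_information(matrix, seq):
    """Sum the weights of the bases observed in one candidate site (bits)."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(weights[base] for weights, base in zip(matrix, seq))


def delta_information(matrix, seq, pos, new_base):
    """Return information(mutant) - information(reference) for one substitution."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return site_information(matrix, mutant) - site_information(matrix, seq)


# Made-up toy alignment; position 3 is an invariant G in every site.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGG"]
toy_matrix = weight_matrix(toy_sites)
change = delta_information(toy_matrix, "CAGGTAAGT", 3, "A")
```

A negative `change` means the substituted site carries less information than the reference site, which is the direction expected for a variant that weakens a conserved position.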

    Localized, Non-Random Differences in Chromatin Accessibility Between Homologous Metaphase Chromosomes

    BACKGROUND: Condensation differences along the lengths of homologous, mitotic metaphase chromosomes are well known. This study reports molecular cytogenetic data showing quantifiable localized differences in condensation between homologs that are related to differences in accessibility (DA) of associated DNA probe targets. Reproducible DA was observed for ~10% of locus-specific, short (1.5-5 kb) single copy DNA probes used in fluorescence in situ hybridization. RESULTS: Fourteen probes (from chromosomes 1, 5, 9, 11, 15, 17, 22) targeting genic and intergenic regions were developed and hybridized to cells from 10 individuals with cytogenetically-distinguishable homologs. Differences in hybridization between homologs were non-random for 8 genomic regions (RGS7, CACNA1B, GABRA5, SNRPN, HERC2, PMP22:IVS3, ADORA2B:IVS1, ACR) and were not unique to known imprinted domains or specific chromosomes. DNA probes within CCNB1, C9orf66, ADORA2B:Promoter-Ex1, PMP22:IVS4-Ex 5, and intergenic region 1p36.3 showed no DA (equivalent accessibility), while OPCML showed unbiased DA. To pinpoint probe locations, we performed 3D-structured illumination microscopy (3D-SIM). This showed that genomic regions with DA had 3.3-fold greater volumetric, integrated probe intensities and broad distributions of probe depths along axial and lateral axes of the 2 homologs, compared to a low copy probe target (NOMO1) with equivalent accessibility. Genomic regions with equivalent accessibility were also enriched for epigenetic marks of open interphase chromatin (DNase I HS, H3K27Ac, H3K4me1) to a greater extent than regions with DA. CONCLUSIONS: This study provides evidence that DA is non-random and reproducible; it is locus specific, but not unique to known imprinted regions or specific chromosomes. Non-random DA was also shown to be heritable within a 2 generation family. 
    DNA probe volume and depth measurements of hybridized metaphase chromosomes further show locus-specific chromatin accessibility differences by super-resolution 3D-SIM. Based on these data and the analysis of interphase epigenetic marks of genomic intervals with DA, we conclude that there are localized differences in compaction of homologs during mitotic metaphase and that these differences may arise during or preceding metaphase chromosome compaction. Our results suggest new directions for locus-specific structural analysis of metaphase chromosomes, motivated by the potential relationship of these findings to underlying epigenetic changes established during interphase.

    Likely community transmission of COVID-19 infections between neighboring, persistent hotspots in Ontario, Canada [version 1; peer review: awaiting peer review]

    This study aimed to produce community-level geo-spatial mapping of confirmed COVID-19 cases in Ontario, Canada, in near real-time to support decision-making. This was accomplished by area-to-area geostatistical analysis, space-time integration, and spatial interpolation of COVID-19 positive individuals.

    Improved radiation expression profiling in blood by sequential application of sensitive and specific gene signatures

    Purpose. Combinations of expressed genes can discriminate radiation-exposed from normal control blood samples by machine learning based signatures (with 8 to 20% misclassification rates). These signatures can quantify therapeutically-relevant as well as accidental radiation exposures. The prodromal symptoms of Acute Radiation Syndrome (ARS) overlap those present in Influenza and Dengue Fever infections. Surprisingly, these human radiation signatures misclassified gene expression profiles of virally infected samples as false positive exposures. The present study investigates these and other confounders, and then mitigates their impact on signature accuracy. Methods. This study investigated recall by previous and novel radiation signatures independently derived from multiple Gene Expression Omnibus datasets on common and rare non-malignant blood disorders and blood-borne infections (thromboembolism, S. aureus bacteremia, malaria, sickle cell disease, polycythemia vera, and aplastic anemia). Normalized expression levels of signature genes are used as input to machine learning-based classifiers to predict radiation exposure in other hematological conditions. Results. Except for aplastic anemia, these blood-borne disorders modify the normal baseline expression values of genes present in radiation signatures, leading to false-positive misclassification of radiation exposures in 8 to 54% of individuals. Shared changes, predominantly in DNA damage response and apoptosis-related gene transcripts in radiation and confounding hematological conditions, compromise the utility of these signatures for radiation assessment. These confounding conditions (sickle cell disease, thromboembolism, S. aureus bacteremia, malaria) induce neutrophil extracellular traps, initiated by chromatin decondensation, DNA damage response and fragmentation followed by programmed cell death. 
    Riboviral infections (for example, Influenza or Dengue fever) have been proposed to bind and deplete host RNA binding proteins, inducing R-loops in chromatin. R-loops that collide with incoming replication forks can result in incompletely repaired DNA damage, inducing apoptosis and releasing mature virus. To mitigate the effects of confounders, we evaluated predicted radiation-positive samples with novel gene expression signatures derived from radiation-responsive transcripts encoding secreted blood plasma proteins whose expression levels are unperturbed by these conditions. Conclusions. This approach identifies and eliminates misclassified samples with underlying hematological or infectious conditions, leaving only samples with true radiation exposures. Diagnostic accuracy is significantly improved by selecting genes that maximize both sensitivity and specificity in the appropriate tissue using combinations of the best signatures for each of these classes of signatures.
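
The idea of applying a sensitive signature first and a specific signature second can be sketched as a two-stage filter. Everything in this sketch is hypothetical: the threshold classifiers, the feature names `damage_response` and `plasma_protein`, and the sample values are invented for illustration and do not represent the study's signatures or data.

```python
def sequential_calls(samples, sensitive_clf, specific_clf):
    """Return a final positive/negative call per sample.

    Stage 1 applies the sensitive classifier to every sample. Stage 2 applies
    the specific classifier only to stage-1 positives, so a sample is called
    positive only when both stages agree.
    """
    calls = {}
    for sample_id, profile in samples.items():
        if not sensitive_clf(profile):
            calls[sample_id] = False
        else:
            calls[sample_id] = bool(specific_clf(profile))
    return calls


# Hypothetical stand-ins for trained classifiers (simple thresholds).
def toy_sensitive(profile):
    return profile["damage_response"] > 1.0


def toy_specific(profile):
    return profile["plasma_protein"] > 1.0


# Invented expression values: the "infected" sample mimics a confounder that
# triggers the sensitive stage but not the specific stage.
toy_samples = {
    "irradiated": {"damage_response": 2.0, "plasma_protein": 2.0},
    "infected": {"damage_response": 2.0, "plasma_protein": 0.2},
    "control": {"damage_response": 0.1, "plasma_protein": 0.1},
}
final_calls = sequential_calls(toy_samples, toy_sensitive, toy_specific)
```

In this toy setup the confounded sample passes the first stage but is rejected by the second, which is the kind of false positive the sequential design is meant to remove.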

    Context-based FISH localization of genomic rearrangements within chromosome 15q11.2q13 duplicons

    BACKGROUND: Segmental duplicons (SDs) predispose to an increased frequency of chromosomal rearrangements. These rearrangements can cause a diverse range of phenotypes due to haploinsufficiency, in cis positional effects or gene interruption. Genomic microarray analysis has revealed gene dosage changes adjacent to duplicons, but the high degree of similarity between duplicon sequences has confounded unequivocal assignment of chromosome breakpoints within these intervals. In this study, we localize rearrangements within duplicon-enriched regions of Angelman/Prader-Willi (AS/PWS) syndrome chromosomal deletions with fluorescence in situ hybridization (FISH). RESULTS: Breakage intervals in AS deletions were localized recursively with short, coordinate-defined, single copy (SC) and low copy (LC) genomic FISH probes. These probes were initially coincident with duplicons and regions of previously reported breakage in AS/PWS. Subsequently, probes developed from adjacent genomic intervals more precisely delineated deletion breakage intervals involving genes, pseudogenes and duplicons in 15q11.2q13. The observed variability in the deletion boundaries within previously described Class I and Class II deletion AS samples is related to the local genomic architecture in this chromosomal region. CONCLUSIONS: Chromosome 15 abnormalities associated with SDs were precisely delineated at a resolution equivalent to genomic Southern analysis. This context-dependent approach can define the boundaries of chromosome rearrangements for other genomic disorders associated with SDs.